1 ID code consistency between .fam and RRB phenotypes

  • The .fam IDs and RRB phenotype IDs are not consistent
  • A match list can be found in microarray/Microarray_clean.txt
  • The RRB phenotype IDs are then updated to match the .fam IDs
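
The relabelling step can be sketched as below. This is a minimal illustration, not the script used for the report: it assumes the match list can be read as two whitespace-separated columns (phenotype ID, then .fam ID), which may not be the actual layout of microarray/Microarray_clean.txt. The function names and the toy IDs are hypothetical.

```python
import io


def load_id_map(handle):
    """Read a two-column map (phenotype_id, fam_id); the layout is an assumption."""
    id_map = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip blank, short, or comment lines
        id_map[fields[0]] = fields[1]
    return id_map


def relabel(pheno_rows, id_map):
    """Replace each row's phenotype ID with its .fam ID; collect IDs with no match."""
    relabelled, unmatched = [], []
    for row in pheno_rows:
        fam_id = id_map.get(row["id"])
        if fam_id is None:
            unmatched.append(row["id"])
        else:
            relabelled.append({**row, "id": fam_id})
    return relabelled, unmatched


# Toy data with hypothetical IDs, standing in for the real match list.
toy_map = load_id_map(io.StringIO("famA.p1 CHIP_0001\nfamA.s1 CHIP_0002\n"))
rows, missing = relabel(
    [{"id": "famA.p1", "score": 3}, {"id": "famB.p1", "score": 1}], toy_map
)
print(rows, missing)
```

Returning the unmatched IDs separately makes it easy to see which phenotype records have no genotyped counterpart instead of dropping them silently.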


2 Structure of three SSC chips

  • 10220 individuals from 2591 SSC families were genotyped on three chips.
  • Note that members of each family were analyzed on the same array.
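
As a quick consistency check, the per-chip counts reported in sections 2.1 to 2.3 can be summed and compared with the totals above. The sketch below only restates numbers from this report; the dictionary layout is an illustrative choice.

```python
# Per-chip counts copied from sections 2.1 to 2.3 of this report.
chips = {
    "Illumina 1Mv3": {"families": 1189, "people": 4626, "males": 2703, "females": 1923},
    "Illumina 1Mv1": {"families": 333, "people": 1354, "males": 801, "females": 553},
    "Illumina HumanOmni2.5M": {"families": 1069, "people": 4240, "males": 2490, "females": 1750},
}

total_families = sum(c["families"] for c in chips.values())
total_people = sum(c["people"] for c in chips.values())

# Each chip's sex counts should add up to its number of people.
sex_counts_ok = all(c["males"] + c["females"] == c["people"] for c in chips.values())

print(total_families, total_people, sex_counts_ok)
```

Because members of each family were analyzed on the same array, family counts can be added across chips without double counting.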

2.1 Illumina 1Mv3

  • 1189 families
  • 4626 people (2703 males, 1923 females)
  • 1199033 SNPs


2.2 Illumina 1Mv1

  • 333 families
  • 1354 people (801 males, 553 females)
  • 1072814 SNPs


2.3 Illumina HumanOmni2.5M

  • 1069 families
  • 4240 people (2490 males, 1750 females)
  • 2440283 SNPs


3 RRB phenotypes

3.1 Phenotype description

  • phenotype counts and levels
  • primary and secondary variable information for probands

3.1.1 Probands (2588 individuals)


  • 56 phenotypes including age, sex and race


3.1.2 Unaffected Siblings (2098 individuals)


  • 27 phenotypes including sex


3.1.3 Other Siblings (296 individuals)


  • 26 phenotypes


3.2 Phenotype distributions

3.2.1 Probands



3.2.2 Unaffected Siblings



3.2.3 Other Siblings



3.3 Shared phenotypes

  • Proband VS Unaffected Siblings: 15
  • Proband VS Other Siblings: 14
  • Unaffected Siblings VS Other Siblings: 26
  • Proband VS Unaffected Siblings VS Other Siblings: 14
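
Counts of this kind are set intersections over the phenotype names of each group. The sketch below shows the computation on toy data; the phenotype names are hypothetical placeholders, not the actual RRB variables.

```python
from itertools import combinations


def shared_counts(groups):
    """Count phenotypes shared by each pair of groups and by all groups together."""
    counts = {}
    for a, b in combinations(groups, 2):
        counts[f"{a} VS {b}"] = len(groups[a] & groups[b])
    counts[" VS ".join(groups)] = len(set.intersection(*groups.values()))
    return counts


# Hypothetical phenotype names, for illustration only.
toy_groups = {
    "Proband": {"sex", "age", "rrb_total", "iq"},
    "Unaffected Siblings": {"sex", "rrb_total", "srs"},
    "Other Siblings": {"rrb_total", "srs"},
}

for label, n in shared_counts(toy_groups).items():
    print(label, n)
```

In the reported counts, all 26 Other Sibling phenotypes are shared with Unaffected Siblings, so any phenotype shared by Probands and Other Siblings is also present in Unaffected Siblings. This is why the three-way count (14) equals the Proband VS Other Siblings count.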

3.3.1 Shared phenotype summary



3.3.2 Shared phenotype distribution

3.3.2.1 Probands VS Unaffected Siblings



3.3.2.2 Probands VS Other Siblings


3.3.2.3 Probands VS Siblings



4 Check sex information

  • Using the plink --check-sex option on the post-QC genotypes in impute_pipe
  • Genotype QC: --geno 0.05 --hwe 1e-6 --mind 0.1 --maf 0.01
  • Note: PAR regions (CHR 25 and .hh) are removed; only CHR 23 is used
  • Chr-X F > 0.8 is called male (coded as 1); F < 0.2 is called female (coded as 2)
  • 11 PROBLEM individuals are included in the RRB phenotype files
    • 10 consistent with phenotype files of unaffected siblings
    • 1 with no sex information from other siblings (11712 4584699075_R01C02 2 0 PROBLEM 0.7319 UCSF_1Mv3 11712.s2)
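
The thresholding rule above can be written out as a small function. This is a sketch of the decision logic only, not plink's implementation. It assumes a call is reported as 0 (ambiguous) when F falls between the two thresholds, and that a record is OK only when the recorded sex is nonzero and equals the genotype-based call. The function names are hypothetical.

```python
def call_sex(f_stat, male_min=0.8, female_max=0.2):
    """Return 1 (male), 2 (female), or 0 (ambiguous) from the Chr-X F estimate."""
    if f_stat > male_min:
        return 1
    if f_stat < female_max:
        return 2
    return 0


def check_status(ped_sex, f_stat):
    """Compare the recorded sex with the genotype-based call."""
    snp_sex = call_sex(f_stat)
    status = "OK" if ped_sex != 0 and snp_sex == ped_sex else "PROBLEM"
    return snp_sex, status


# The individual listed above: recorded sex 2, F = 0.7319.
print(check_status(2, 0.7319))
```

For that individual, F = 0.7319 lies between 0.2 and 0.8, so the genotype-based call is 0 and the record is flagged PROBLEM, in line with the "2 0 PROBLEM 0.7319" fields quoted above.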


4.1 Mismatch summary




4.2 Chr-X F distributions

4.2.1 Illumina 1Mv3



  • 31 PROBLEM

4.2.2 Illumina 1Mv1



  • 12 PROBLEM

4.2.3 Illumina Omni2.5M



  • 9 PROBLEM



5 Pairwise IBD estimation

  • Using plink --genome rel-check on genome-wide post-QC genotypes
  • Genotype QC: --geno 0.05 --hwe 1e-6 --mind 0.1 --maf 0.01; SNPs were then pruned with --indep-pairwise, retaining ~50K SNPs
  • Only pairs of individuals within the same family are checked
  • Relationships (RT): OT (other; here, the two parents), FS (Full Siblings), PO (Parent Offspring)
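
For reading the IBD estimates, it helps to recall that PI_HAT = P(IBD=2) + 0.5 * P(IBD=1). The sketch below computes the value expected for each relationship type from the standard IBD sharing probabilities and flags pairs that deviate from it. It assumes the two parents (OT) are unrelated; the tolerance of 0.1 and the function names are hypothetical choices, not thresholds used in this report.

```python
# Expected (Z0, Z1, Z2) = P(IBD=0), P(IBD=1), P(IBD=2) per relationship type.
# OT is treated as an unrelated parent pair, which is an assumption.
EXPECTED_Z = {
    "PO": (0.0, 1.0, 0.0),
    "FS": (0.25, 0.5, 0.25),
    "OT": (1.0, 0.0, 0.0),
}


def pi_hat(z0, z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def flag_pair(rt, observed_pi_hat, tolerance=0.1):
    """Return True when the observed PI_HAT is far from the expected value."""
    expected = pi_hat(*EXPECTED_Z[rt])
    return abs(observed_pi_hat - expected) > tolerance


for rt, z in EXPECTED_Z.items():
    print(rt, pi_hat(*z))
```

PO and FS pairs share the same expected PI_HAT of 0.5 and are separated by Z1 and Z2 rather than by PI_HAT alone.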

5.1 Estimated pairwise IBD distributions

5.1.1 Illumina 1Mv3



5.1.2 Illumina 1Mv1



5.1.3 Illumina Omni2.5M



5.2 Estimated pairwise IBD VS. Chr-X F

5.2.1 Illumina 1Mv3



5.2.2 Illumina 1Mv1



5.2.3 Illumina Omni2.5M





6 Individual genome-wide heterozygosity

  • Using --het to calculate genome-wide (using pruned SNPs) heterozygosity
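  • Mean heterozygosity = (N - O)/N, where N is the number of non-missing genotypes, N(NM), and O is the observed number of homozygous genotypes, O(HOM); in R: Het_mean <- (N.NM. - O.HOM.)/N.NM.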
  • Using --missing to calculate missing rates (individuals with missing rates > 0.1 will be removed)
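
The two per-individual quantities above can be sketched as follows. The heterozygosity formula mirrors the R expression in this section, and the 0.1 cutoff is the one stated above. The function names and the example numbers are hypothetical.

```python
def het_rate(n_nm, o_hom):
    """Heterozygosity rate: (N(NM) - O(HOM)) / N(NM)."""
    if n_nm <= 0:
        raise ValueError("N(NM) must be positive")
    return (n_nm - o_hom) / n_nm


def passes_missing(f_miss, max_missing=0.1):
    """Keep an individual unless its missing rate exceeds the cutoff."""
    return f_miss <= max_missing


# Hypothetical individual: 50000 non-missing pruned SNPs, 34000 homozygous calls.
print(het_rate(50000, 34000), passes_missing(0.02))
```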

6.1 Genome-wide heterozygosity VS missing rates



  • Grey horizontal lines are y = mean +/- 3SD
  • Red horizontal lines are y = mean +/- 5SD
  • Grey vertical line is x = 0.1
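
The reference lines can be reproduced from the per-individual heterozygosity values. This is a minimal sketch that assumes the sample standard deviation is used (the report does not say which estimator was applied); the function names and the toy values are hypothetical.

```python
from statistics import mean, stdev


def sd_bands(values, k):
    """Return (mean - k*SD, mean + k*SD) using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return m - k * s, m + k * s


def outliers(het_by_id, k=3):
    """List individuals whose heterozygosity falls outside mean +/- k SD."""
    low, high = sd_bands(list(het_by_id.values()), k)
    return sorted(iid for iid, h in het_by_id.items() if h < low or h > high)


# Toy heterozygosity values with one clearly low individual.
toy_het = {f"a{i}": 0.31 for i in range(10)}
toy_het.update({f"b{i}": 0.33 for i in range(10)})
toy_het["x"] = 0.10

print(outliers(toy_het, k=3), outliers(toy_het, k=5))
```

An extreme individual inflates the SD it is compared against, so in a small sample the 5SD band may never be crossed; with thousands of individuals per chip this effect is minor.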


6.2 Genome-wide heterozygosity VS IBD estimation (PI_HAT)

6.2.1 Illumina 1Mv3



6.2.2 Illumina 1Mv1



6.2.3 Illumina Omni2.5M